
    Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

    This paper presents a novel approach to indoor acoustic source localization that uses microphone arrays and a Convolutional Neural Network (CNN). To the best of our knowledge, it is the first published work in which a CNN is designed to directly estimate the three-dimensional position of an acoustic source from the raw audio signal, avoiding the use of hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train the network on semi-synthetic data generated from close-talk speech recordings, in which we simulate the time delays and distortion undergone by the signal as it propagates from the source to the microphone array. We then fine-tune the network using a small amount of real data. Our experimental results show that this strategy produces networks that significantly outperform existing localization methods based on SRP-PHAT. In addition, our experiments show that our CNN method is more robust than the other methods to variations in speaker gender and in window size. Comment: 18 pages, 3 figures, 8 tables
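    The two-step strategy depends on semi-synthetic data in which propagation from the source to each microphone is simulated. The following is a minimal sketch of only the geometric part of that simulation, assuming free-field propagation, a speed of sound of 343 m/s and 1/r attenuation. `simulate_array_signals` is a hypothetical helper; the distortion and room effects modelled by the authors are not reproduced here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def simulate_array_signals(source_signal, source_pos, mic_positions, fs):
    """Hypothetical helper: delay and attenuate one close-talk signal per mic.

    Delays are applied as a frequency-domain phase shift, so fractional
    sample delays are supported. The shift is circular, so the input should
    be zero-padded by at least the largest expected delay.
    """
    source_signal = np.asarray(source_signal, dtype=float)
    n = source_signal.size
    spectrum = np.fft.rfft(source_signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    outputs = []
    for mic in mic_positions:
        distance = np.linalg.norm(np.asarray(mic, dtype=float) - np.asarray(source_pos, dtype=float))
        delay = distance / SPEED_OF_SOUND  # propagation delay in seconds
        gain = 1.0 / max(distance, 1e-3)  # spherical spreading, guarded near zero
        shifted = spectrum * np.exp(-2j * np.pi * freqs * delay)
        outputs.append(gain * np.fft.irfft(shifted, n=n))
    return np.stack(outputs)  # shape: (num_mics, num_samples)
```

    With fs = 16 kHz, a microphone 3.43 m from the source receives the signal 160 samples later, scaled by 1/3.43. A network trained on such pairs of signals and positions could then be fine-tuned on the small real dataset.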

    DPDnet: A Robust People Detector using Deep Learning with an Overhead Depth Camera

    In this paper we propose a deep-learning method that detects multiple people from a single overhead depth image with high reliability. Our neural network, called DPDnet, consists of two fully convolutional encoder-decoder blocks built from residual layers. The main block takes a depth image as input and generates a pixel-wise confidence map in which each detected person is represented by a Gaussian-like distribution. The refinement block combines the depth image with the output of the main block to refine the confidence map. Both blocks are trained simultaneously end-to-end using depth images and head position labels. The experiments show that DPDnet outperforms state-of-the-art methods, achieving accuracies greater than 99% on three different publicly available datasets without retraining or fine-tuning. In addition, the computational complexity of our proposal is independent of the number of people in the scene, and it runs in real time on conventional GPUs.
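    The confidence map represents each person as a Gaussian-like blob. The following is a minimal sketch of building such a target from head labels and reading detections back from its peaks. It assumes isotropic Gaussians merged with a pixel-wise maximum and 8-neighbour peak picking above a threshold. `confidence_map`, `detect_peaks`, `sigma` and `threshold` are hypothetical names and values, not the authors' code.

```python
import numpy as np


def confidence_map(shape, head_positions, sigma=5.0):
    """Hypothetical target map: one Gaussian blob per labelled head.

    shape: (rows, cols) of the depth image.
    head_positions: iterable of (row, col) head centres.
    sigma: assumed blob width in pixels.
    Blobs are merged with a pixel-wise maximum so each peak stays at 1.0.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    target = np.zeros(shape, dtype=float)
    for r, c in head_positions:
        blob = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        target = np.maximum(target, blob)
    return target


def detect_peaks(conf, threshold=0.5):
    """Return (row, col) of pixels above threshold that are >= all 8 neighbours."""
    h, w = conf.shape
    padded = np.pad(conf, 1, mode="constant", constant_values=-np.inf)
    is_peak = conf > threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            is_peak &= conf >= neighbour
    return [(int(r), int(c)) for r, c in np.argwhere(is_peak)]
```

    In this sketch, peak picking scans the map a fixed number of times. Its cost therefore depends on the image size rather than on how many people are present.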

    Novel GCC-PHAT Model in Diffuse Sound Field for Microphone Array Pairwise Distance Based Calibration

    We propose a novel formulation of the generalized cross-correlation with phase transform (GCC-PHAT) for a pair of microphones in a diffuse sound field. This formulation elucidates the link between the microphone distance and the GCC-PHAT output. Hence, it leads to a new model that enables estimation of the pairwise distances by finding the distances that best match the GCC-PHAT observations. Furthermore, the relation of this model to the coherence function is elaborated, along with its dependence on the signal bandwidth. Experiments conducted on real recordings validate the theory and support the effectiveness of the proposed method.
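    The GCC-PHAT observations that the model is fitted to come from whitening the cross-spectrum of a microphone pair so that only phase information remains. The following is a minimal sketch of the standard GCC-PHAT estimator. It is not the paper's diffuse-field model, which the abstract does not specify, and `gcc_phat` is a hypothetical helper.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Standard GCC-PHAT between two signals (hypothetical helper).

    Returns (tau, cc): the estimated delay of y relative to x in seconds,
    and the correlation rearranged so that the middle index is zero lag.
    """
    n = len(x) + len(y)  # zero-pad so the correlation does not wrap around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Negative lags sit at the end of the irfft output; move them to the front.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc
```

    For two microphones at distance d, physically possible lags are bounded by d/c. This is why `max_tau` is commonly set to that value.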

    Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech

    We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests evaluating speech quality, emotion identification rates and emotional strength was used for the six emotions that we recorded. For the HMM-based method, we evaluated the spectral and source components separately and identified which components contribute to which emotion. Our analysis shows that, although the HMM method produces significantly better neutral speech, the two methods produce emotional speech of similar quality, except for emotions having context-dependent prosodic patterns. While synthetic speech produced using the unit selection method has better emotional strength scores than the HMM-based method, the HMM-based method has the ability to manipulate the emotional strength. For emotions characterized by both spectral and prosodic components, synthetic speech produced with unit selection was more accurately identified by listeners. For emotions mainly characterized by prosodic components, HMM-based synthetic speech was more accurately identified. This finding differs from previous results regarding listener judgements of speaker similarity for neutral speech. We conclude that unit selection methods require improvements to prosodic modeling and that HMM-based methods require improvements to spectral modeling for emotional speech. Certain emotions cannot be reproduced well by either method.

    Machine Learning Methods for Pipeline Surveillance Systems Based on Distributed Acoustic Sensing: A Review

    Researchers and companies are increasingly interested in combining Distributed Acoustic Sensing (DAS) with a Pattern Recognition System (PRS) to detect and classify potentially dangerous events occurring above fiber-optic cables deployed along active pipelines, with the aim of building pipeline surveillance systems. This paper reviews the literature on machine learning techniques applied to pipeline surveillance systems based on DAS+PRS (although its scope extends to any other environment in which DAS+PRS strategies are used). To do so, we describe the fundamentals of machine learning approaches applied to DAS systems and present a detailed review of the main contributions on this topic. Additionally, this paper addresses the most common issues related to the real-field deployment and evaluation of DAS+PRS for pipeline threat monitoring, and aims to provide useful insights and recommendations for the design of such systems. The review concludes that real-field deployment of a PRS based on DAS technology is still a challenging research area, far from being fully solved.

    Some authors were supported by funding from the European Research Council through Starting Grant UFINE (grant number #307441), Water JPI, the WaterWorks2014 Cofunded Call, the European Commission (Horizon 2020) through project H2020-MSCA-ITN-2016/722509-FINESSE, the Spanish Ministry of Economy and Competitiveness, the Spanish “Plan Nacional de I+D+i” through projects TEC2013-45265-R, TEC2015-71127-C2-2-R, TIN2013-47630-C2-1-R, and TIN2016-75982-C2-1-R, and the regional program SINFOTONCM: S2013/MIT-2790 funded by the “Comunidad de Madrid”. H.F.M. acknowledges funding through the FP7 ITN ICONE program, grant number #608099, funded by the European Commission. J.P.-G. acknowledges funding from the Spanish Ministry of Economy and Competitiveness through an FPI contract. SML acknowledges funding from the Spanish Ministry of Science and Innovation through a “Ramón y Cajal” contract. We acknowledge support from the CSIC Open Access Publication Initiative through its Unit of Information Resources for Research (URICI).

    Ferrites for new EMC enclosures: new materials and technologies for effective passive electromagnetic protection to increase the reliability and environmental compatibility of electrotechnical/electronic systems, and visualization of electromagnetic energies. Research report (final)

    Available from TIB Hannover: F02B1295 / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek. SIGLE. Bundesministerium fuer Bildung und Forschung (Federal Ministry of Education and Research), Berlin (Germany); Forschungszentrum Juelich GmbH (Germany), Projekttraeger Neue Materialien und Chemische Technologien (NMT) (Project Management Agency for New Materials and Chemical Technologies). Country: DE. Language: German.

    A multi-position approach in a smart fiber-optic surveillance system for pipeline integrity threat detection

    19 pages, 4 figures, 9 tables. This article belongs to the Special Issue “Pattern Recognition and Applications”.

    We present a new pipeline integrity surveillance system for detecting and classifying threats along long gas pipelines. The system is based on distributed acoustic sensing with phase-sensitive optical time-domain reflectometry (ϕ-OTDR) and on pattern recognition for event classification. The proposal incorporates a multi-position approach into a Gaussian Mixture Model (GMM)-based pattern classification system, which operates in a real-field scenario and is evaluated with a thorough experimental procedure. The objective is to exploit the vibration-related data available at positions near the one actually producing the main disturbance in order to improve the robustness of the trained models. The system integrates two classification tasks: (1) machine + activity identification, which identifies the machine working over the pipeline together with the activity being carried out, and (2) threat detection, which aims to detect suspicious threats to pipeline integrity (independently of the activity being carried out). In machine + activity identification mode, multi-position model training outperforms the previously presented single-position approach by between 6% and 11% absolute for activities that show consistent behavior and high energy, with an overall increase of 3% absolute in classification accuracy. In threat detection mode, the proposed approach achieves an 8% absolute reduction in the false alarm rate, with an overall increase of 4.5% absolute in classification accuracy.

    This research was funded by a private research project by Fluxys, Gassco and Statoil named “PIT-STOP (Early detection of Pipeline Integrity Threats using a SmarT fiber-OPtic surveillance system)”, by the Spanish Ministry of Economy and Competitiveness under projects HEIMDALUAH (TIN2016-75982-C2-1-R) and ARTEMISA (TIN2016-80939-R), by the University of Alcalá under projects ACUFANO (CCG19/IA-024) and ARGOS (CCG20/IA-043), and by the Spanish Ministry of Science, Innovation and Universities (IJCI-2017-33856 and RTI2018-097957-B-C33). The APC was funded by the Ministry of Science, Innovation and Universities of Spain, grant number RTI2018-095324-B-I00.
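    In the multi-position idea, each class model is trained not only on data from the position of the main disturbance but also on data from nearby fiber positions. The following is a minimal sketch, assuming every feature frame is tagged with the fiber position it was extracted from and with the position of the main disturbance. To stay short, it replaces the paper's multi-component GMMs with a single diagonal-covariance Gaussian per class (a one-component GMM). `select_training_frames`, `radius` and the other names are hypothetical.

```python
import numpy as np


def select_training_frames(frame_pos, event_pos, radius):
    """Boolean mask of frames used for training (hypothetical helper).

    radius = 0 keeps only the position of the main disturbance
    (single-position training); radius > 0 also keeps nearby positions
    (multi-position training).
    """
    return np.abs(np.asarray(frame_pos) - np.asarray(event_pos)) <= radius


def fit_diag_gaussians(features, labels):
    """Fit one diagonal-covariance Gaussian per class (a 1-component GMM)."""
    models = {}
    for cls in np.unique(labels):
        x = features[labels == cls]
        models[cls] = (x.mean(axis=0), x.var(axis=0) + 1e-6)  # variance floor
    return models


def log_likelihood(x, mean, var):
    """Per-frame log density under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)


def classify(features, models):
    """Assign each frame to the class with the highest log-likelihood."""
    classes = list(models)
    scores = np.stack([log_likelihood(features, *models[c]) for c in classes], axis=-1)
    return np.asarray(classes)[np.argmax(scores, axis=-1)]
```

    Raising `radius` enlarges the training set for each class with vibration data from neighbouring positions. This is the robustness argument made in the abstract, although the real system's features and model sizes are not reproduced here.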